Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks

نویسندگان

Chin-Cheng Hsu

Hsin-Te Hwang

Yi-Chiao Wu

Yu Tsao

Hsin-Min Wang

چکیده

Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios. In most situations, the source and the target speakers do not repeat the same texts or they may even speak different languages. In this case, one possible, although indirect, solution is to build a generative model for speech. Generative models focus on explaining the observations with latent variables instead of learning a pairwise transformation function, thereby bypassing the requirement of speech frame alignment. In this paper, we propose a non-parallel VC framework with a Wasserstein generative adversarial network (W-GAN) that explicitly takes a VC-related objective into account. Experimental results corroborate the capability of our framework for building a VC system from unaligned data, and demonstrate improved conversion quality.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adversarial Variational Optimization of Non-Differentiable Simulators

Complex computer simulators are increasingly used across fields of science as generative models tying parameters of an underlying theory to experimental observations. Inference in this setup is often difficult, as simulators rarely admit a tractable density or likelihood function. We introduce Adversarial Variational Optimization (AVO), a likelihood-free inference algorithm for fitting a non-di...

متن کامل

Autoencoding topology

The problem of learning a manifold structure on a dataset is framed in terms of a generative model, to which we use ideas behind autoencoders (namely adversarial/Wasserstein autoencoders) to fit deep neural networks. From a machine learning perspective, the resulting structure, an atlas of a manifold, may be viewed as a combination of dimensionality reduction and “fuzzy” clustering.

متن کامل

Generative Adversarial Source Separation

Generative source separation methods such as non-negative matrix factorization (NMF) or auto-encoders, rely on the assumption of an output probability density. Generative Adversarial Networks (GANs) can learn data distributions without needing a parametric assumption on the output density. We show on a speech source separation experiment that, a multilayer perceptron trained with a Wasserstein-...

متن کامل

Whai: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling

To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes. The generative network of WHAI has a hierarc...

متن کامل